[query] Truncate search results to prevent OOM on jaeger-query. #1202

annanay25 · 2018-11-20T18:32:44Z

Resolves #1051.

Changes:

Added upper bound on number of traces to fetch from a Cassandra backend.
In addition to upper bound on traces, added upper bound on number of spans to fetch from an ES backend.

Question: Do we need a flag to mark the result as truncated on the UI?

Signed-off-by: Annanay <[email protected]>

yurishkuro

Thanks for the PR.

yurishkuro · 2018-11-24T04:37:35Z

plugin/storage/cassandra/spanstore/reader.go

@@ -237,7 +237,10 @@ func (s *SpanReader) FindTraces(ctx context.Context, traceQuery *spanstore.Trace
 	}
 	if traceQuery.NumTraces == 0 {
 		traceQuery.NumTraces = defaultNumTraces
+	} else if traceQuery.NumTraces > 10000 {
+		traceQuery.NumTraces = 10000


why this hardcoded upper limit? It should be user-configurable.
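A minimal sketch of what a user-configurable version of the hunk above might look like: the default still applies when NumTraces is 0, but the hardcoded 10000 becomes a configured maximum, where 0 disables the cap. The helper name normalizeNumTraces and its parameters are hypothetical and not part of Jaeger.

```go
package main

import "fmt"

// normalizeNumTraces is a hypothetical helper. It applies defaultNumTraces when
// the caller did not request a limit (0) and clamps the result to maxNumTraces
// when maxNumTraces is positive. A maxNumTraces of 0 disables the cap.
func normalizeNumTraces(requested, defaultNumTraces, maxNumTraces int) int {
	n := requested
	if n == 0 {
		n = defaultNumTraces
	}
	if maxNumTraces > 0 && n > maxNumTraces {
		n = maxNumTraces
	}
	return n
}

func main() {
	fmt.Println(normalizeNumTraces(0, 100, 10000))     // 100: default applied
	fmt.Println(normalizeNumTraces(50000, 100, 10000)) // 10000: clamped to the configured cap
	fmt.Println(normalizeNumTraces(50000, 100, 0))     // 50000: no cap configured
}
```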

Sure. Also:

Going through the current traceQueryParameters() (https://github.com/jaegertracing/jaeger/blob/master/cmd/query/app/query_parser.go#L139), NumTraces seems to be defined as defaultQueryLimit, but in the reader implementation a default is only applied when NumTraces is not passed explicitly (https://github.com/jaegertracing/jaeger/blob/master/plugin/storage/cassandra/spanstore/reader.go#L238).

Should this be renamed?

^ This is fine. My bad, confused the two.

annanay25 · 2018-11-28T08:16:17Z

So I'm a little confused. Each query can already be parameterised with a limit on the number of traces (https://github.com/jaegertracing/jaeger/blob/master/cmd/query/app/query_parser.go#L139); if none is specified, a default is used. Doesn't this already resolve #1051? If not, why not?

yurishkuro · 2018-11-28T15:38:53Z

The original ticket was about individual traces with a large number of spans (we've seen traces with tens of millions of spans).

annanay25 · 2018-11-30T11:14:57Z

@yurishkuro OK.

In that case, I will drop the following changes to plugin/storage/cassandra/spanstore/reader.go:

	if traceQuery.NumTraces == 0 {
		traceQuery.NumTraces = defaultNumTraces
	} else if traceQuery.NumTraces > 10000 {
		traceQuery.NumTraces = 10000
	}

but will retain the following TerminateAfter(defaultNumSpans) call:

	searchRequests[i] = elastic.NewSearchRequest().
		IgnoreUnavailable(true).
		Type(spanType).
		Source(elastic.NewSearchSource().
			Query(query).
			Size(defaultDocCount).
			TerminateAfter(defaultNumSpans).
			Sort("startTime", true).
			SearchAfter(nextTime))

Should defaultNumSpans be user-configurable? That would require a change to the parameter list of multiRead().

codecov · 2018-12-09T07:15:29Z

Codecov Report

Merging #1202 into master will decrease coverage by 0.06%.
The diff coverage is 58.33%.

@@            Coverage Diff             @@
##           master    #1202      +/-   ##
==========================================
- Coverage     100%   99.93%   -0.07%     
==========================================
  Files         160      160              
  Lines        7181     7189       +8     
==========================================
+ Hits         7181     7184       +3     
- Misses          0        4       +4     
- Partials        0        1       +1

Impacted Files	Coverage Δ
plugin/storage/es/spanstore/reader.go	`100% <100%> (ø)`	⬆️
cmd/query/app/query_parser.go	`94.79% <37.5%> (-5.21%)`	⬇️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

Last update eed786f...a509e7d.

annanay25 · 2018-12-09T07:20:48Z

PR updated.

I still need to figure out a way to parameterise GetTrace() with numSpans, so that a user can force retrieval of all spans. That would require changes to the spanstore.Reader interface!

annanay25 · 2018-12-17T10:15:15Z

@yurishkuro could you please re-run Travis on this?

pavolloffay · 2018-12-21T14:36:26Z

There is a git conflict. It needs to be rebased.

Signed-off-by: Annanay <[email protected]>

annanay25 · 2018-12-28T16:45:09Z

@pavolloffay I've rebased and resolved the conflicts.

Also, if we want to allow users to force retrieval of more than 10k spans for a trace, we'd have to make changes to the spanstore.Reader interface. What's our take on that?

yurishkuro · 2018-12-28T21:16:27Z

cmd/query/app/query_parser.go

@@ -30,12 +30,14 @@ import (

 const (
 	defaultQueryLimit = 100
+	defaultSpansLimit = 10000


There was no default previously, so we should not introduce one, i.e. the default should be 0, which means unlimited.

see the very last comment
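To illustrate the "0 means unlimited" convention, here is a minimal sketch of a per-trace span limit that also reports whether truncation happened (which would be one way to surface the "truncated" flag asked about in the PR description). Span and truncateSpans are hypothetical stand-ins, not Jaeger's model types. Note that truncating an already-fetched slice only bounds what is returned; it does not reduce what storage reads.

```go
package main

import "fmt"

// Span is a hypothetical stand-in for a stored span.
type Span struct {
	SpanID uint64
}

// truncateSpans applies a per-trace span limit. A maxSpansPerTrace of 0
// (or any non-positive value) is treated as unlimited. It returns the
// possibly shortened slice and whether any spans were dropped.
func truncateSpans(spans []Span, maxSpansPerTrace int) ([]Span, bool) {
	if maxSpansPerTrace <= 0 || len(spans) <= maxSpansPerTrace {
		return spans, false
	}
	return spans[:maxSpansPerTrace], true
}

func main() {
	spans := make([]Span, 5)
	for i := range spans {
		spans[i].SpanID = uint64(i + 1)
	}
	kept, truncated := truncateSpans(spans, 0)
	fmt.Println(len(kept), truncated) // 5 false
	kept, truncated = truncateSpans(spans, 3)
	fmt.Println(len(kept), truncated) // 3 true
}
```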

yurishkuro · 2018-12-28T21:17:11Z

cmd/query/app/query_parser.go


 	operationParam   = "operation"
 	tagParam         = "tag"
 	tagsParam        = "tags"
 	startTimeParam   = "start"
 	limitParam       = "limit"
+	limitSpansParam  = "limitSpans"


s/limitSpans/maxSpansPerTrace/

see the very last comment

yurishkuro · 2018-12-28T21:18:55Z

cmd/query/app/query_parser.go

@@ -105,6 +107,16 @@ func (p *queryParser) parse(r *http.Request) (*traceQueryParameters, error) {
 		limit = int(limitParsed)
 	}

+	limitSpansParam := r.FormValue(limitSpansParam)
+	limitSpans := defaultSpansLimit


scope the string var to the if statement:

	var maxSpansPerTrace int
	if maxSpansParam := r.FormValue(maxSpansPerTraceParam); maxSpansParam != "" {
	}

yurishkuro · 2018-12-28T21:25:01Z

plugin/storage/es/spanstore/reader.go

@@ -269,7 +269,7 @@ func (s *SpanReader) multiRead(ctx context.Context, traceIDs []string, startTime
 			if val, ok := searchAfterTime[traceID]; ok {
 				nextTime = val
 			}
-			searchRequests[i] = elastic.NewSearchRequest().IgnoreUnavailable(true).Type(spanType).Source(elastic.NewSearchSource().Query(query).Size(defaultDocCount).Sort("startTime", true).SearchAfter(nextTime))
+			searchRequests[i] = elastic.NewSearchRequest().IgnoreUnavailable(true).Type(spanType).Source(elastic.NewSearchSource().Query(query).Size(defaultDocCount).TerminateAfter(numSpans).Sort("startTime", true).SearchAfter(nextTime))


what is the difference between Size() and TerminateAfter()?
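For context on this question: in the Elasticsearch search API, size bounds how many hits are returned in one response, while terminate_after bounds how many documents each shard collects before the query terminates early. Elasticsearch collects documents before sorting, so terminate_after caps per-shard work rather than guaranteeing the first N hits in sort order. The sketch below models a simplified request body with encoding/json only; searchBody and buildBody are hypothetical, the query clause is omitted, and the exact JSON that the chained olivere/elastic builder emits may differ.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// searchBody is a hypothetical, simplified model of a search request body.
// Size limits the hits returned in one response; TerminateAfter limits the
// documents each shard collects (before sorting) and is omitted when zero.
type searchBody struct {
	Size           int                            `json:"size"`
	TerminateAfter int                            `json:"terminate_after,omitempty"`
	Sort           []map[string]map[string]string `json:"sort"`
	SearchAfter    []interface{}                  `json:"search_after,omitempty"`
}

// buildBody assembles the body for one page, sorted by startTime ascending.
func buildBody(size, terminateAfter int, nextTime uint64) ([]byte, error) {
	body := searchBody{
		Size:           size,
		TerminateAfter: terminateAfter,
		Sort:           []map[string]map[string]string{{"startTime": {"order": "asc"}}},
		SearchAfter:    []interface{}{nextTime},
	}
	return json.Marshal(body)
}

func main() {
	b, err := buildBody(10000, 10000, 0)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(b))
}
```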

yurishkuro · 2018-12-28T21:26:03Z

plugin/storage/es/spanstore/reader.go

@@ -54,6 +54,7 @@ const (

 	defaultDocCount  = 10000 // the default elasticsearch allowed limit
 	defaultNumTraces = 100
+	defaultNumSpans  = 10000


this has no effect currently since multi-read is already bounded by defaultDocCount

yurishkuro · 2018-12-28T21:27:46Z

storage/spanstore/interface.go

@@ -51,4 +51,5 @@ type TraceQueryParameters struct {
 	DurationMin   time.Duration
 	DurationMax   time.Duration
 	NumTraces     int
+	NumSpans      int


however, why does this need to be passed from the interface? I think the overall feature is intended to protect storage from OOM, so the parameterization should be done via a parameter when instantiating the storage, not at query time.

Hi @yurishkuro, I think this particular issue was filed for OOM on the jaeger-query nodes, and we have a separate issue for handling overloading of the underlying storage - #960

Fixing it in the storage will address query-service OOM.

OK, will look into it. Closing this PR.

So do we want to restrict number of spans (per trace) at ingestion? @yurishkuro

Not at ingestion (which isn't possible anyway since spans arrive async). We want the limit to be applied at query time, similar to the changes you made to plugin/storage/es/spanstore/reader.go, but the value of the limit to be configurable as a parameter of the query-service, passed at Reader construction from CLI, not as a parameter of the HTTP request.
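A minimal sketch of the approach described here: the limit is read once from the command line and handed to the reader at construction time, instead of arriving with each HTTP request. SpanReader, NewSpanReader, spansToFetch, newReaderFromArgs and the flag name query.max-spans-per-trace are all hypothetical illustrations, not Jaeger's actual types or flags; only the standard flag package is real.

```go
package main

import (
	"flag"
	"fmt"
)

// SpanReader is a simplified, hypothetical stand-in for a storage reader.
type SpanReader struct {
	maxSpansPerTrace int // 0 means unlimited
}

// NewSpanReader receives the limit once, at construction time.
func NewSpanReader(maxSpansPerTrace int) *SpanReader {
	return &SpanReader{maxSpansPerTrace: maxSpansPerTrace}
}

// spansToFetch returns how many of the available spans a trace query may read.
func (r *SpanReader) spansToFetch(available int) int {
	if r.maxSpansPerTrace > 0 && available > r.maxSpansPerTrace {
		return r.maxSpansPerTrace
	}
	return available
}

// newReaderFromArgs parses CLI args with a dedicated FlagSet and builds the reader.
func newReaderFromArgs(args []string) (*SpanReader, error) {
	fs := flag.NewFlagSet("query", flag.ContinueOnError)
	// Hypothetical flag name, used only for this illustration.
	maxSpans := fs.Int("query.max-spans-per-trace", 0, "max spans to load per trace (0 = unlimited)")
	if err := fs.Parse(args); err != nil {
		return nil, err
	}
	return NewSpanReader(*maxSpans), nil
}

func main() {
	r, err := newReaderFromArgs([]string{"--query.max-spans-per-trace=10000"})
	if err != nil {
		panic(err)
	}
	fmt.Println(r.spansToFetch(25000)) // 10000
}
```

Because the value is fixed when the reader is built, the HTTP API and the spanstore.Reader interface stay unchanged, which is the concern raised about GetTrace() earlier in the thread.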

Sorry it's taking so many iterations to get this one right. Understood now; sending a new, clean PR.

annanay25 requested review from black-adder, jpkrohling, objectiser, pavolloffay, tiffon, vprithvi and yurishkuro as code owners November 20, 2018 18:32

annanay25 changed the title ~~[Storage] Truncate search results to prevent OOM on jaeger-query.~~ [query] Truncate search results to prevent OOM on jaeger-query. Nov 20, 2018

yurishkuro requested changes Nov 24, 2018

annanay25 force-pushed the add-numtrace-upperbound branch from 29a998d to f67231c December 9, 2018 07:15

annanay25 force-pushed the add-numtrace-upperbound branch from f67231c to e73dcd4 December 9, 2018 07:16

Annanay added 3 commits December 26, 2018 23:07

[Storage] Truncate search results to prevent OOM on jaeger-query.

2008414

Signed-off-by: Annanay <[email protected]>

Adding num-spans limit

e05a4d4

Signed-off-by: Annanay <[email protected]>

Limit span count in ES queries

a509e7d

Signed-off-by: Annanay <[email protected]>

annanay25 force-pushed the add-numtrace-upperbound branch from e73dcd4 to a509e7d December 26, 2018 17:48

yurishkuro requested changes Dec 28, 2018

annanay25 closed this Jan 5, 2019

annanay25 mentioned this pull request Jan 15, 2019

Add CLI configurable MaxNumSpans while retrieving spans from ES. #1283

Merged

ankitnayan mentioned this pull request Jul 27, 2022

Loading a trace detail page with 10K spans consumes all available resource of a host SigNoz/signoz#1432

Closed

yurishkuro mentioned this pull request May 15, 2023

BUG: Remove TerminateAfter from Elasticsearch/Opensearch query resulting in incomplete span count/list #4336

Merged
